08. Calculate a Covariance Matrix

Remember how we defined the covariance matrix:

\mathbf{P} = \begin{bmatrix} \mathrm{Cov}(r_A,r_A) & \mathrm{Cov}(r_A,r_B)\\ \mathrm{Cov}(r_B,r_A) & \mathrm{Cov}(r_B,r_B) \end{bmatrix}.

And covariance is

\mathrm{Cov}(r_A,r_B) = \mathrm{E}[(r_A - \bar{r}_A)(r_B - \bar{r}_B)].

If r_A and r_B are vectors of observed values, that is, they take on the value pairs (r_{Ai}, r_{Bi}) for i = 1,\ldots, n, each with equal probability 1/n, then the covariance can be estimated from the sample as

= \frac{1}{n-1}\sum_{i=1}^n(r_{Ai} - \bar{r}_A)(r_{Bi} - \bar{r}_B).

We use n-1 in the denominator of the constant out front for the same reason we use n-1 in the sample standard deviation: we are working with a sample, and we want an unbiased estimate of the population covariance.
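
As a quick sanity check, here is a minimal NumPy sketch of this sample covariance formula (the return values are made up purely for illustration); np.cov uses the same 1/(n-1) normalization by default, so the two numbers should agree.

    import numpy as np

    # Hypothetical daily returns for two assets A and B (made-up numbers).
    r_A = np.array([0.010, -0.020, 0.015, 0.003, -0.007])
    r_B = np.array([0.008, -0.015, 0.020, -0.001, -0.005])

    n = len(r_A)

    # Sample covariance with the unbiased 1/(n-1) normalization.
    cov_AB = np.sum((r_A - r_A.mean()) * (r_B - r_B.mean())) / (n - 1)

    # np.cov defaults to the same 1/(n-1) normalization; entry [0, 1] is Cov(r_A, r_B).
    print(cov_AB, np.cov(r_A, r_B)[0, 1])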

But if \bar{r}_A = \bar{r}_B = 0, then the covariance equals

= \frac{1}{n-1}\sum_{i=1}^nr_{Ai}r_{Bi}.

In matrix notation, this equals

\frac{1}{n-1}\mathbf{r}_A^\mathrm{T}\mathbf{r}_B.
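
Continuing the same sketch, once the two (hypothetical) return vectors have been demeaned, the covariance is just this dot product scaled by 1/(n-1).

    import numpy as np

    # Demean each made-up return series so its mean is zero.
    r_A = np.array([0.010, -0.020, 0.015, 0.003, -0.007])
    r_B = np.array([0.008, -0.015, 0.020, -0.001, -0.005])
    r_A = r_A - r_A.mean()
    r_B = r_B - r_B.mean()

    n = len(r_A)

    # With zero means, Cov(r_A, r_B) = r_A^T r_B / (n - 1), i.e. a scaled dot product.
    cov_AB = (r_A @ r_B) / (n - 1)
    print(cov_AB)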

Therefore, if \mathbf{r} is a matrix that contains the vectors \mathbf{r}_A and \mathbf{r}_B as its columns,

\mathbf{r} = \begin{bmatrix} \vdots & \vdots\\ \mathbf{r}_A & \mathbf{r}_B\\ \vdots & \vdots \end{bmatrix},

then

\mathbf{r}^\mathrm{T}\mathbf{r} = \begin{bmatrix} \cdots & \mathbf{r}_A^\mathrm{T} & \cdots\\ \cdots & \mathbf{r}_B^\mathrm{T} & \cdots \end{bmatrix} \begin{bmatrix} \vdots & \vdots\\ \mathbf{r}_A & \mathbf{r}_B\\ \vdots & \vdots \end{bmatrix} = \begin{bmatrix} \mathbf{r}_A^\mathrm{T}\mathbf{r}_A & \mathbf{r}_A^\mathrm{T}\mathbf{r}_B\\ \mathbf{r}_B^\mathrm{T}\mathbf{r}_A & \mathbf{r}_B^\mathrm{T}\mathbf{r}_B \end{bmatrix}.

So if each column of your data matrix \mathbf{r} (each vector of observations) has mean 0, you can calculate the covariance matrix as:

\frac{1}{n-1}\mathbf{r}^\mathrm{T}\mathbf{r}.
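
Putting it all together, here is a minimal NumPy sketch (again with made-up returns) that demeans each column and forms the covariance matrix as \frac{1}{n-1}\mathbf{r}^\mathrm{T}\mathbf{r}; np.cov with rowvar=False should produce the same matrix.

    import numpy as np

    # Data matrix r: one column of (made-up) returns per asset, one row per observation.
    r = np.array([[ 0.010,  0.008],
                  [-0.020, -0.015],
                  [ 0.015,  0.020],
                  [ 0.003, -0.001],
                  [-0.007, -0.005]])

    r = r - r.mean(axis=0)          # make each column mean zero

    n = r.shape[0]
    P = (r.T @ r) / (n - 1)         # 2 x 2 covariance matrix

    # np.cov treats rows as variables by default, hence rowvar=False here.
    print(P)
    print(np.cov(r, rowvar=False))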